HapTree-X: An Integrative Bayesian Framework for Haplotype Reconstruction from Transcriptome and Genome Sequencing Data
Abstract
Identifying phase information is biomedically important due to the association of complex haplotype effects, such as compound heterozygosity, with disease. As recent next-generation sequencing (NGS) technologies provide more read sequences, the use of diverse sequencing datasets for haplotype phasing is now possible, allowing haplotype reconstruction of a single sequenced individual using NGS data. Previous haplotype reconstruction studies have ignored differential allele-specific expression in whole transcriptome sequencing (RNA-seq) data; however, intuition suggests that the asymmetry in these data (i.e., the maternal and paternal haplotypes of a gene are differentially expressed) can be exploited to improve phasing power. In this paper, we describe a novel integrative maximum-likelihood estimation framework, HapTree-X, for efficient, scalable haplotype assembly of an individual genome using transcriptomic and genomic NGS read datasets, which makes use of differential allele-specific expression. Our advance includes the first method for haplotype assembly that uses differential expression, newly allowing the use of reads that cover only one SNP. We evaluate the performance of HapTree-X on real sequencing read data, both transcriptomic and genomic, from NA12878 (1000 Genomes Project and Gencode) and demonstrate that HapTree-X increases the number of SNPs that can be phased and the sizes of phased haplotype blocks, without compromising accuracy. We prove theoretical bounds on the precise improvement of accuracy, as a function of coverage, that can be achieved from differential expression-based methods alone. Thus, the advantage of our integrative approach grows substantially as the amount of RNA-seq data increases.
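The intuition that expression asymmetry lets reads covering a single SNP contribute to phasing can be illustrated with a toy calculation. The sketch below is not the HapTree-X algorithm; it assumes a simple binomial read-count model in which one haplotype of a gene contributes a fixed fraction `beta` of the reads, and it compares the likelihood that the reference alleles at two SNPs lie on the same haplotype against the likelihood that they lie on opposite haplotypes. The function names, the parameter `beta`, and the read counts are all hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)


def phase_log_odds(ref_a, alt_a, ref_b, alt_b, beta=0.7):
    """Log-odds that the reference alleles at SNPs A and B share a haplotype.

    Assumed toy model (not taken from the paper): one haplotype of the gene
    yields a fraction `beta` of the reads at every SNP, and each read covers
    only one SNP, so the counts at A and B are independent given the phasing.
    """
    n_a, n_b = ref_a + alt_a, ref_b + alt_b

    def log_likelihood(same_haplotype):
        # Sum over which haplotype is the over-expressed one (prior 1/2 each).
        terms = []
        for p_a in (beta, 1 - beta):
            p_b = p_a if same_haplotype else 1 - p_a
            terms.append(log_binom_pmf(ref_a, n_a, p_a)
                         + log_binom_pmf(ref_b, n_b, p_b))
        top = max(terms)  # log-sum-exp for numerical stability
        return top + log(sum(exp(t - top) for t in terms)) + log(0.5)

    return log_likelihood(True) - log_likelihood(False)


if __name__ == "__main__":
    # Reference allele over-represented at both SNPs: same haplotype favored.
    print(phase_log_odds(14, 6, 15, 5))
    # Same allelic ratios at ten times the coverage: stronger evidence.
    print(phase_log_odds(140, 60, 150, 50))
```

Under this toy model the log-odds are zero when `beta` is 0.5 (no differential expression, so single-SNP reads carry no phase signal), and their magnitude grows with read depth, which is consistent with the abstract's point that the benefit increases with the amount of RNA-seq data.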
Related resources
HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data
As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet com...
Transcriptome Sequencing of Guilan Native Cow in Comparison with bosTau4 Reference Genome
RNA-sequencing is a new method of transcriptome characterization of organisms. Based on identity and relatedness, there are large genetic variations among different cattle breeds. The goal of the current study was to sequence the transcriptome of the Guilan native cow and compare it with the available reference genome using RNA-sequencing. Blood samples were collected from 14 Guilan native cows and...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data.
We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a hig...
O-36: Genome Haplotyping and Detection of Meiotic Homologous Recombination Sites in Single Cells, A Generic Method for Preimplantation Genetic Diagnosis
Background: Haplotyping is invaluable not only to identify genetic variants underlying a disease or trait, but also to study evolution and population history as well as meiotic and mitotic recombination processes. Current genome-wide haplotyping methods rely on genomic DNA that is extracted from a large number of cells. Thus far random allele drop out and preferential amplification artifacts of...
Journal title:
- Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB
Volume 9029
Publication date 2015